Anytime optimal MDP planning with trial-based heuristic tree search

Author

Thomas Keller

Abstract

Planning and acting in a dynamic environment is a challenging task for an autonomous agent, especially in the presence of uncertain and exogenous effects, a large number of states, and a long planning horizon. In this thesis, we approach the problem by considering algorithms that interleave planning for the current state with execution of the chosen decision. The agent's main challenge is to use its limited deliberation time wisely. One solution is determinization, which simplifies the Markov Decision Process (MDP) describing the uncertain environment to a deterministic planning problem. We introduce an all-outcomes determinization in which, unlike in comparable methods, the number of deterministic actions is bounded polynomially rather than exponentially in the number of parallel probabilistic effects. We discuss three algorithms that base their decisions solely on the solution of a determinization and show that they have fundamental limitations that prevent optimal behavior even when they are given unlimited resources. The main contribution of this thesis, the Trial-based Heuristic Tree Search (THTS) framework, allows algorithms to be described in terms of only six ingredients that can be mixed and matched at will. We present a selection of ingredients and analyze theoretically which combinations yield asymptotically optimal behavior. Our implementation of the THTS framework, the probabilistic planner PROST, not only allows us to evaluate all anytime optimal algorithms empirically on the benchmarks of the International Probabilistic Planning Competition (IPPC), but also underlines the potential of THTS as the back-to-back winner of the competition in 2011 and 2014. In the final chapter, we introduce the MDP-Evaluation Stopping Problem, the optimization problem faced by participants of IPPC 2014. We show how it can be constructed formally, discuss three special cases that are solvable in practice, and present approximate algorithms based on techniques derived from the solutions for these special cases. Finally, we show theoretically and empirically that all proposed algorithms improve significantly over the application of the state-of-the-art approach.
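The abstract describes THTS only as a framework of six interchangeable ingredients. As a rough illustration of what "mixing and matching" pluggable pieces can look like, here is a minimal sketch of a UCT-like trial-based tree search on a hypothetical two-state toy MDP. The toy MDP, the function names (`ucb1`, `sample_outcome`, `trial`, `thts`, `blind`) and the split into heuristic, action selection, outcome selection, backup, trial length and recommendation are assumptions made for this sketch. They are not the thesis's definitions and not the PROST implementation.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical toy MDP (not from the thesis). In state 0, "safe" pays 0.6;
# "risky" pays 1.0 but moves to the absorbing zero-reward state 1 with p = 0.5.
ACTIONS = ["safe", "risky"]

def transitions(state: int, action: str) -> List[Tuple[float, int]]:
    """Return (probability, successor) pairs."""
    if state == 0 and action == "risky":
        return [(0.5, 0), (0.5, 1)]
    return [(1.0, state)]

def reward(state: int, action: str) -> float:
    if state == 1:
        return 0.0
    return 1.0 if action == "risky" else 0.6

@dataclass
class ChanceNode:
    visits: int = 0
    value: float = 0.0
    children: Dict[int, "DecisionNode"] = field(default_factory=dict)

@dataclass
class DecisionNode:
    state: int
    steps_left: int
    visits: int = 0
    value: float = 0.0
    children: Dict[str, ChanceNode] = field(default_factory=dict)

def ucb1(node: DecisionNode, c: float = 1.4) -> str:
    """Action selection (assumed ingredient): try each action once, then UCB1."""
    untried = [a for a in ACTIONS if a not in node.children]
    if untried:
        return untried[0]
    log_n = math.log(node.visits)
    return max(ACTIONS, key=lambda a: node.children[a].value
               + c * math.sqrt(log_n / node.children[a].visits))

def sample_outcome(state: int, action: str, rng: random.Random) -> int:
    """Outcome selection (assumed ingredient): sample by transition probability."""
    probs, succs = zip(*transitions(state, action))
    return rng.choices(succs, weights=probs)[0]

def trial(node: DecisionNode, heuristic, rng: random.Random) -> float:
    """Run one trial below `node` and return the sampled return."""
    if node.steps_left == 0:
        return 0.0
    if node.visits == 0:
        # Trial length + heuristic (assumed ingredients): stop at the first
        # new node and initialize it with the heuristic estimate.
        node.visits, node.value = 1, heuristic(node.state, node.steps_left)
        return node.value
    action = ucb1(node)
    chance = node.children.setdefault(action, ChanceNode())
    succ = sample_outcome(node.state, action, rng)
    if succ not in chance.children:
        chance.children[succ] = DecisionNode(succ, node.steps_left - 1)
    ret = reward(node.state, action) + trial(chance.children[succ], heuristic, rng)
    # Backup (assumed ingredient): Monte-Carlo averaging of sampled returns.
    for n in (chance, node):
        n.visits += 1
        n.value += (ret - n.value) / n.visits
    return ret

def thts(state: int, horizon: int, num_trials: int, heuristic, seed: int = 0) -> str:
    rng = random.Random(seed)
    root = DecisionNode(state, horizon)
    for _ in range(num_trials):
        trial(root, heuristic, rng)
    # Recommendation (assumed ingredient): action with the highest estimate.
    return max(root.children, key=lambda a: root.children[a].value)

def blind(state: int, steps_left: int) -> float:
    """A blind heuristic that estimates every state's value as 0."""
    return 0.0
```

Calling `thts(0, 3, 2000, blind)` should return `"safe"`. For this toy MDP, the exact horizon-3 values from state 0 are 2.2 for safe and 1.8 for risky, even though risky is the better choice when only one step remains. Swapping, for example, the Monte-Carlo backup for a max-based backup, or `blind` for an informed heuristic, changes one ingredient without touching the rest of the loop. That independence is the property the framework is built around.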

Similar resources

A Stochastic Process Model of Classical Search

Among classical search algorithms with the same heuristic information, with sufficient memory A* is essentially as fast as possible in finding a proven optimal solution. However, in many situations optimal solutions are simply infeasible, and thus search algorithms that trade solution quality for speed are desirable. In this paper, we formalize the process of classical search as a metalevel dec...

An UCT Approach for Anytime Agent-Based Planning

In this paper, we introduce a new heuristic search algorithm based on mean values for anytime planning, called MHSP. It consists in associating the principles of UCT, a bandit-based algorithm which gave very good results in computer games, and especially in Computer Go, with heuristic search in order to obtain an anytime planner that provides partial plans before finding a solution plan, and fu...

Trial-Based Heuristic Tree Search for Finite Horizon MDPs

Dynamic programming is a well-known approach for solving MDPs. In large state spaces, asynchronous versions like Real-Time Dynamic Programming (RTDP) have been applied successfully. If unfolded into equivalent trees, Monte-Carlo Tree Search algorithms are a valid alternative. UCT, the most popular representative, obtains good anytime behavior by guiding the search towards promising areas of the...

Informed Asymptotically Optimal Anytime Search

Path planning in robotics often requires finding high-quality solutions to continuously valued and/or high-dimensional problems. These problems are challenging and most planning algorithms instead solve simplified approximations. Popular approximations include graphs and random samples, as respectively used by informed graph-based searches and anytime sampling-based planners. Informed graph-bas...

Search-Based Footstep Planning

Efficient footstep planning for humanoid navigation through cluttered environments is still a challenging problem. Often, obstacles create local minima in the search space, forcing heuristic planners such as A* to expand large areas. Furthermore, planning longer footstep paths often takes a long time to compute. In this work, we introduce and discuss several solutions to these problems. For nav...

Publication date: 2015
